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-Wk  presenl^a  new  domain  decomposed  fast  Poisson  solver  on  a  rectangle  divided  into  parallel 
strips  or  boxes.  The  method  first  performs  uncoupled  fast  solves  on  each  subdomain,  and  then  the 
interface  variables  are  computed  exactly  by  fast  Fourier  transform,  without  computing  or  inverting 
the  capacitance  matrix  explicitly.  Finally,  the  solution  on  the  interior  of  the  subdomains  can  be^ 
computed  by  one  more  fast  solve  on  each  subdomain. 

This  method,  as  opposed  to  others,  does  not  involve  any  iteration  in  the  solution  of  the  system 
for  the  interface  variables.  It  is  especially  suited  for  parallel  implementation,  ance  the  independent 
problems  in  the  subdomains  can  be  solved  in  parallel,  and  the  communication  involves  the  interface 
variables  only.  ^ 
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1.  Introdaction 

We  consider  the  solution  of  the  Poisson  equation  in  a  rectangular  region  partitioned  into  strips. 
By  the  method  of  Domain  Decomposition,  the  solution  of  the  algebrmc  equations  resulting  from 
the  discretization  on  a  regular  grid  is  reduced  to  the  solution  of  problems  in  the  subdomains  and 
a  linear  system  for  the  interface  unknowns,  given  by  the  capacitance  matrix.  This  is  an  important 
tool  in  the  solution  of  elliptic  partial  differential  equations.  There  are  several  reasons  why  these 
techniques  might  be  attractive: 

•  The  method  is  suited  for  the  solution  of  very  large  problems  on  machines  with  limited  storage. 

•  Special  solution  techniques  might  exist  to  solve  the  problems  on  the  subdomains  that  cannot  be 
applied  efficiently  to  the  entire  domain.  This  is  often  the  case,  for  example,  when  the  domain 
has  irregular  geometry,  but  it  can  be  broken  up  into  regular  subdomains,  like  rectangles. 


•  The  equations  in  the  different  subdomains  might  have  different  parameters  or  even  be  of  dif¬ 
ferent  nature,  in  which  case  the  idea  of  substructuring  comes  veiy  naturally. 

•  The  idea  is  attractive  for  parallel  processing,  since  the  problem  can  be  decoupled  in  independent 
subproblems  and  the  communication  needed  will  be  only  for  the  interface  values. 

The  capacitance  matrix  is  expensive  to  compute  or  invert  explicitly,  while  it  is  relatively  simple  to 
compute  the  product  of  such  matrix  with  an  arbitrary  vector.  For  that  reason  the  interface  system 
is  usually  solved  by  conjugate  gradient  methods  instead  of  a  direct  method.  In  order  to  improve 
the  convergence  of  the  method,  several  preconditioning  techniques  have  been  given  in  the  literature 
(6,  7, 1,  2]. 

For  the  problem  considered  in  thb  paper,  we  present  a  fast  direct  method  for  the  solution  to 
the  capacitance  matrix  that  does  not  require  the  computation  of  the  elements  of  the  matrix.  As 
oppos^  to  the  other  methods  given  so  far,  there  is  no  iteration  involved. 

In  the  case  of  uniform  sized  strips,  we  derive  the  eigenvectors  and  eigenvalues  of  the  capacitance 
matrix.  This  gives  not  only  a  direct  method  to  solve  the  interface  system,  but  also  a  framework  to 
analyse  other  preconditioners. 

In  section  2,  we  apply  the  method  of  Domain  Decomposition  to  the  solution  of  the  Poisson 
equation  in  a  rectangular  domain  subdivided  into  strips,  and  derive  a  system  for  the  interface 
unknowns  and  the  capacitance  matrix.  In  section  3,  we  analyze  the  capacitance  matrix,  giving  its 
eigenvectors  and  eigem’alues.  Based  on  this,  we  derive  a  fast  direct  method  for  the  solution  of  the 
interface  system.  The  eigen-decomposition  of  the  capacitance  matrix  also  provides  a  clear  way  of 
analyzing  some  other  preconditioners.  In  section  4,  we  analyze  the  preconditioners  given  by  Golub 
and  Mayers,  Dryja  and  Proskurowski,  and  Bjorstad  and  W'idlund.  Finally,  in  sections  5  and  6,  we 
discuss  extensions  of  the  method  to  rectangular  domains  subdivided  into  boxes,  and  more  general 
variable  coefficient  operators  and  summaryze  the  results,  giving  some  concluding  remarks. 


2.  Domain  Decomposition 

We  consider  the  Poisson  equation 


with  boundary  conditions 


Au  »  /  on  n 


V  s  U(  on  dft 

where  the  domain  H  is  as  illustrated  in  Fig.  1. 


1 


Accesio 

NT'S 

b  ct  .. 

■\  For  , 

,  :.,iJ  □ 

:-’i  □ 

By  . 

Dist.  ibi;tio:i/ 

Availability  Co'ios 

Dial 

j  /!“/ 

Avail  j!  V.;  /  or 

Special 

_ 

(2.1) 


Figure  1:  The  domain  Cl  and  its  partition 

We  partition  O  into  A;  +  1  strips  H,-,  t  =  1, . . . ,  A:  +  1  of  sizes  1  by  and  denote  the  interfaces 
between  fit  and  by  r,-,t  =  1, We  use  a  uniform  mesh  with  grid  size  h  on  Cl  with  n 
internal  grid  points  in  the  z-direction,  i.e. 

— i- 

n+  1 

and  m,-  internal  grid  points  in  O,-  in  the  y-direction,  i.e.  for  t  =  1, . . . ,  ib  +  1, 

li  =  (m,-  +  l)h. 

Let  us  consider  a  standard  5-point  centered  difference  approximation  to  (2.1).  If  we  order  the 
unknowns  for  the  internal  points  of  the  subdommns  first  amd  then  those  in  the  interfaces  F,-,  then 
the  discrete  solution  vector  u  s  (uoiUr)  satisfies  a  linear  system  that  can  be  expressed  in  block 
form  as: 

where  the  right  hand  side  depends  on  /  and  ut  and 


Qr  = 


^2fc+l 


where  L,-  corresponds  to  the  discrete  Laplacian  on  n,-  for  t  <  il:  +  1,  and  £,■  for  t  >  A;  +  1  is  the 
tridiagonal  matrix  tridiag[l,  —4, 1).  The  matrix  P  has  the  following  block  bi-diagonal  form: 
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/A.1  \ 

^2.1  ^2,2 

P  = 

Pkjc-l  Pk,k 


(2.5) 


where  PiJ  corresponds  to  the  coupling  between  the  unknowns  in  the  subdomain  (1,-  with  those  in 
the  interface  Tj. 

By  applying  block  Gaussian  Elimination  to  (2.2),  we  obtain  the  following  system  for  the 
interface  unknowns: 

Cur=g  (2.6) 

where 

gsfr-P'^Qa^h  (2.7) 

and 

C^Qr-P^Qa^P  .  (2.8) 


C  is  sometiuita*  called  the  capacitance  matrix.  Since  Qn  is  block  diagonal,  the  right  hand  nde  of 
(2.6)  given  by  (2.7)  can  be  evaluated  by  solving  a  problem  like  (2.1)  on  each  subdomain  O,-,  with 
zero  Dirichlet  boundary  conditions  on  the  interfaces  r,-_i  and  F,-  ,  t  s  1,...,A;  -f-  1.  Once  (2.6) 
is  solved,  the  problem  is  decoupled  and  the  solution  tin.  the  subdomains  can  be  computed  by 
solving  k  +  1  independent  subproblems: 


=  /Oi  - 

for  t  s  1.  This  b  nothing  more  than  solving  for  tiQi  on  each  subdomain  O,-  with  the 

computed  tiPi.,  and  ur^  as  boundary  conditons. 

Thb  technique  of  reducing  the  problem  on  H  to  the  solution  of  decoupled  problems  on  the 
subdomains  and  a  smaller  system  for  the  interface  b  usually  called  domain  decomposition  or  sub- 
structuring. 


S.  The  capacitance  system 

By  substituting  (2.3),  (2.4)  and  (2.5)  in  (2.8),  we  can  see  that  the  matrix  C  has  the  following 
block  tridiagonal  form: 

/Cl  B2  \ 


C= 

Bk 

Bk  Ck) 

where  C|  b  the  capacitance  matrix  corresponding  to  the  inteface  F,-,  i.e. 


(3.1) 


and 


Ci  —  Irk+l+l  —  Pj^Lf  A+l,l 
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T  r-l 


(3.2) 

(3.3) 
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The  matrix  C  is  expensive  to  compute  explicitly,  since  the  computation  of  the  blocks  C,-  and 
Bi  requires  the  solution  of  2n  subproblems  on  each  H,-.  Also,  the  resulting  blocks  are  dense  n 
by  n  matrices,  and  therefore,  the  solution  to  (2.6)  by  Gaussian  elimination  would  require  O(n^) 
operations,  which  may  overcome  the  O(n^log(n))  complexity  of  the  fast  solvers  for  the  subdomadns, 
when  m  n. 

Instead  of  using  a  direct  method  to  solve  the  system  (2.6),  preconditioned  conjugate  gradient 
(PCG)  methods  can  be  applied,  where  only  matrix-vector  products  of  the  form  Cw  for  a  given 
vector  w  are  needed.  From  (3.2),  we  can  see  that  the  evaluation  of  C,w  for  each  t  requires  the 
solution  of  two  subdomain  problems.  For  example,  the  product  — computed  by 
solving  the  discretized  version  of  (2.1)  on  Oj  with  homogeneous  right  hand  side  (i.e.,  the  Laplace 
equation)  and  boundary  conditions  uswonr2,UB:0on  the  rest  of  the  boundary,  and  then 
taking  the  solution  on  the  first  row  of  grid  points  above  r2.  Similarly,  B2W  corresponds  to  taking 
the  same  solution  on  the  first  row  below  Fi. 

Elach  iteration  of  the  PCG  method  requires  the  solution  of  at  least  one  problem  at  each  subdo- 
mmn.  Therefore,  it  is  important  to  keep  the  number  of  iterations  low.  A  number  of  preconditioners 
for  the  matrix  C  have  been  given  in  the  literature  in  order  to  improve  the  convergence  [6,  7,  1,  2]. 

It  turns  out  that  for  the  problem  we  are  considering,  an  exact  decomposition  for  the  capacitance 
matrix  is  possible.  The  system  can  then  be  solved  directly  by  fast  Fourier  transforms,  and  therefore 
no  preconditioned  conjugate  gradient  iteration  is  necessary.  The  two  strips  case  is  analyzed  by  Chan 
in  [4],  where  the  eigenvectors  and  eigenvalues  of  the  capacitance  matrix  C  are  given  exactly. 

Here  we  extend  that  result  to  the  multbtrip  and  multi-box  cases.  We  will  show  that  the 
matrices  C,-  and  B,-  in  (3.1)  have  the  same  eigenvectors.  The  system  (2.6)  can  thus  be  solved  by 
fast  Fourier  transforms  and  cyclic  reduction  [3,  8]. 

Let  us  define  the  orthogonal  matrix  W  as  the  matrix  whose  columns  are 


Wt 


\l  — ^-“(sin  j5rft,sin  2j9h,  •  •  •  ,sin  njirh)"^ 
V  n  +  1 


(3.4) 


for  j  =  l,...,n  , 


and 


<Tj  =  4sin  2 


(3.5) 

(3.6) 


Then,  we  have  the  following 

T.amins  g.l.  The  matrices  C,-  and  B,  have  the  same  eigenvectors  (3.4).  For  i  —  l,...,k,  we  have 


and  for  t  2, . . . ,  k,  we  have 


W^C,ir-A,  -  diag(A„ . Ain) 


W'^BiW  «  Di  -  diag(6n  ,...,«,„) 


where 


“  ( 1  -  1  -  j  V  ^  4  ’ 


(3.7) 


(3.8) 


(3.9) 
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and 


(3.10) 


Proof.  We  will  compute  the  three  terms  of  CiWj  in  (3.2)  separately.  As  we  stated  before,  the 
product  —P^L^^Pi^iWj  can  be  computed  by  solving  the  problem 


(3.11) 


and  taking  the  values  of  the  solution  at  the  first  row  of  grid  points  above  F,-.  By  using  separation 
of  variables,  it  can  be  shown  [4]  that  the  solution  to  (3.11)  at  each  grid  point  (s,t)  is  given  by: 


Aav  =  0 

in 

n. 

v  =  0 

on 

dCii  n  an 

o 

II 

on 

r,-i 

V  —  Wj 

on 

r. 

v{sh,  th)  =  (cir^  +  cjrL)  sin  sjirh. 


(3.12) 


where  r.^.  and  r_  are  the  roots  of  the  characteristic  polynomial  corresponding  to  substituting  (3.12) 
in  (3.11),  namely,  _ 


r+  =  l  +  ^+ + 


(3.13) 


with  cry  given  by  (3.5).  The  constants  ci  and  cj  are  determined  by  the  boundary  conditions  and 
they  are  given  by 


Cl  =  -■ 


.mi+l  _mi+l 


-  rl" 


_mi+l 


—  r_ 

Therefore,  we  can  compute  the  product  —P^fLJ'^Pi^Wj  by  taking  (  =  1  in  (3.12),  i.e. 


where 


r_ 


(3.14) 

(3.15) 


Since  r.^r_  «  1,  7y  s  ri  and  therefore  (3.6)  and  (3.15)  are  equivalent.  By  a  similar  computation 
we  can  prove  that 

_  _  _ 

-f-  ~ 


“  ^  i 


(3.16) 


Finally,  it  can  also  be  verified  that 
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Lk+i+iWj  =  (-2  -  <Ty)u;y. 


(3.17) 


Combining  (3.14), (3.16)  and  (3.17)  we  have 


CiWj  =  XijWj 


where 


Xij  =  —2  —  etj  + 


l_^?*«+»  )  i_^;**+»  ) 


which,  after  some  simplifications,  leads  to  the  expression  (3.9). 

By  similar  arguments,  BiWj  requires  the  computation  of  the  solution  to  (3.11)  at  the  m,-th 
row  away  from  F,-  and  we  can  prove  that: 


BiWj  —  v{-,mih)  =  —SijWj 

where 


Sij  =  cir“‘  +  C2rll*‘ 


Let  us  partition  the  vectors  ur  ^<1  9  (2-6)  as: 


/«r, 's 

f  9Tx  > 

ur  = 

“Fa 

9- 

pr. 

Vffr*  > 

where,  for  t  = 


f  Ui,-  > 

f9li\ 

ur,  = 

U2, 

9Vi  = 

92i 

^  9ni  / 

The  system  (2.6)  may  be  written 

Ci«r,  +  Bjur,  =  9ri 

+  CiUVi  +  B,+i«r<+,  =  ffr.,  »  =  2, . . . , fc  -  l 
+  C'twr*  =  sr* 

Using  (3.7)  and  (3.8),  the  s>'stem  (3.18)  becomes 


/  Ai  i?2 

Z?2  ' 

Dt 

«r* 

= 

ffra 

V 

Aj.  J 

lur*  J 

(3.18) 


(3.19) 


where,  for  i  <  k, 


hi  =  V. 


(3.20) 


and 


uri  =  yVur..  (3-21) 

By  reordering  the  equations  in  (3.19),  the  system  can  be  reduced  to  the  solution  of  n  tridiagonal 
systems  of  dimension  k  and  up  can  then  be  computed  by  by  the  expression  (3.21).  Note  that  Fast 
Fourier  Transforms  can  be  used  in  computing  g  and  up.  An  outline  of  the  algorithm  follows: 


ALGORITHM  DDFPS  (  Domain-Decomposed  Fast  Poisson  Solver) 

Step  1:  Compute  right  hand  side  g  by  (2.7).  This  requires  the  solution  of  ib  -I-  1  decoupled 
problems  of  the  form  (2.1)  on  fl,-,  t  =  1, . . . ,  1;  -I- 1,  namely 


AhVi 

=  f 

in 

n. 

Vi 

=  n. 

on 

an,-  n  an 

Vi 

=  0 

on 

r.-i 

Vi 

=  0 

on 

r. 

and  then  compute 


?r.  =  hi  -  «,(.,»»,•)  -  r,+i  (.,  1) 


Step  S:  Obtain  new  right  hand  side  g  by  (3.20)  using  Fast  Fourier  l^ansforms. 
Step  S:  For  j  =  1, . . . ,  n,  solve 


/  Ai/  Sij  > 

f9ji\ 

% 

«/2 

9J2 

V  iki  ^ki 

^9jk) 

Step  4:  Fast  Fourier  Transform  the  resulting  up/s  to  obtain  up^  by  (3.21). 
Step  5:  Solve  the  following  problem  in  each  subdomain 

Afcu,  =  /  on  n,- 


(3.22) 


with  boundary  conditions 

on  dili  n  dn 
on  r,-i 
on  r,- 


«i  = 

«i  =  wr< 


3.1.  Uniform  sixe  strips 

When  m,-  =  m  for  all  t,  i.e.  all  strips  have  identical  dimensions,  all  the  blocks  C,-  in  (3.1) 
are  identical,  as  well  as  the  blocks  B,-.  In  this  case,  the  eigenvalues  and  eigenvectors  of  C  can  be 
explicitly  computed. 

7 


Lemma  S.2.  If  nii  s  m,l  <  i  <  k  +  1,  then  tJie  cAp&dt&nce  iaa,trix  has  the  form 

(Cl  B  \ 

c.  ‘ 

B 

V  B  Cl  j 

and  the  eigenvalues  of  C  are  given  by 

aij  =  A/  +  Sjcos-j^^ 

fori  =  1, ...  A:  and  j  =  1, . . .  n,  where  Xj  are  the  eigenvalues  of  Ci  and  6j  are  the  eigenvalues  of  B. 
The  eigenvectors  v,-y  are  the  direct  product  of  the  vectors  wj  deSned  in  (3.4)  and 


/  •  ^  .  .  *■!  .T 


j.e. 


Vij  =  Zi  ®  Wj  = 


({sin^)wj\ 
{  sin^  fWj 


V  (  sioi^)wj  J 

Proof.  It  can  be  verified  that 

Cvij  =  otijVij 

by  simple  substitution  and  using  the  fact  that  r,-  and  Qij  are  the  eigenvectors  and  eigenvalues  of 
the  tridagonal  matrix: 

\ 


Tj  = 


6j  XjJ 


As  an  improvement  over  the  plmn  Fourier  Analysis  method,  which  has  complexity  0[knlog2n), 
block-cyclic  reduction  techniques  can  be  combined  with  fast  Fourier  transforms  to  get  an  algorithm 
for  solving  the  interface  system  for  the  regular  sized  strips  case,  which  has  asymptotic  complexity 
0  {knlog2log2n)  [3,9]  . 

4.  Analysis  of  some  preconditioners 

As  we  mentioned  before,  the  system  (3.1)  was  previously  solved  by  the  preconditioned  conju¬ 
gate  gradient  method.  Several  preconditioners  have  been  proposed  in  order  to  improve  the  conver¬ 
gence  of  the  method.  For  the  case  where  there  b  only  one  interface  (k  =  1),  Dryja  [6]  proposed 
as  preconditioner  for  C  the  square  root  of  the  one-dimensional  discrete  Laplacian.  This  matrix, 
which  will  be  denoted  by  Mp,  can  be  inverted  by  fast  Fourier  transform,  since  it  has  the  following 
decomposition: 

Md  =  irdioff(AP,  Af ,  •  •  • ,  aP)H^^  (4.1) 
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where  the  columns  of  W  are  given  by  (3.4)  and 


(4.2) 


with  (Tj  given  by  (3.5).  Golub  and  Mayers  (7)  improve  Dryja’s  results  with  a  preconditioner  given 
by 

Mg  =  H'd»a<;(Af ,  Af ,  •  •  • ,  X^)W'^  (4.3) 

where  _ 


tr? 


(4.4) 


For  the  two  strips  case,  Bjorstad  and  Widlund  [l]  give  the  following  approximation  to  C  as  a 
preconditioner: 

Mb  =  Lz-  2P,^,Lr‘A.i 

When  the  two  strips  have  the  same  dimensions,  this  preconditioner  is  exact,  although  their  method 
involves  an  extra  solve  in  the  subdomain  Hi  in  order  to  invert  Mb,  as  opposed  to  solving  by  only 
one  fast  Fourier  transform  on  the  interface. 

In  [5],  Dryja  and  Proskurowski  propose  an  algorithm  for  the  case  of  multiple  strips.  By 
ignoring  the  off-diagonal  blocks  in  (3.1)  and  using  the  preconditioners  for  the  two-strips  case  on 
the  diagonal,  they  propose  using  either  the  block-diagonal  matrix  diag(A/£))  or  diag(JVf(7). 

Using  the  results  of  the  previous  section,  we  can  analyze  these  preconditioners.  It  is  a  well 
known  fact  that  the  rate  of  convergence  of  the  preconditioned  conjugate  gradient  method  depends 
on  the  condition  number  of  the  matrix  where  M  is  the  preconditioner,  being  close  to  one. 

We  will  use  the  2-norm  condition  number  in  thb  paper,  and  for  simplicity,  we  will  only  consider  the 
case  of  regular  sized  strips,  with  m,-  =  m.  In  [4],  Chan  shows  for  the  two-strips  case  that  K{Mq^C) 
and  K(M0^C)  depend  on  the  aspect  ratio 


and  in  particular. 


a  a=  (m-f-  l)/(n  +  1) 


1  -4- 

limA'(M-C)=^^ 


(4.5) 


(4.6) 


The  asymptotic  expression  (4.6)  is  plotted  in  Fig.  2.  As  we  can  see,  the  condition  number  will  be 
large  for  small  o. 

For  the  case  of  multiple  strips,  we  have  to  analyze  the  effect  of  dropping  the  blocks  B,-'s. 
Note  that  when  the  eigenvalues  of  B,-,  namely  6j,  are  small  compared  to  Ay,  the  system  (3.22) 
becomes  diagonal  and  therefore,  it  can  be  trivially  solved.  We  will  see  that  the  values  of  6j  are 
smadl  compared  to  Ay  when  the  aspect  ratio  (4.5)  is  large.  This  can  be  derived  from  Lemma  3.2, 
where  we  give  an  asymptotic  expression  for  the  ratio  ^  in  terms  of  the  aspect  ratio  a  when  h—*0. 

By  computing  (3.10)  and  (3.9)  for  the  regular  strips  case,  we  can  see  that 


m+1 


(4.7) 


which  is  a  decreasing  function  of  j.  Therefore, 


,  =  max  , 
Ai  i<y<n  Ay 


Moreover,  we  have: 


•  .■TrT:-w  "W  %.~w 


i 

vl 


2.5 


CMipcoi  reile  a 


Lemma  4.1. 


Figure  2:  Condition  number  of  the  preconditioned  capad- 
tance  matrix  with  Golub  and  Mayers’  preconditoner  (asymp¬ 
tote  for  h  — ♦  0  )  . 


V  ~  =  1 

h—o  Ai  +  e“** 


Proof.  Since  —*  0  when  h  — *  0,  we  have 


lim  (log'Y]"'*'^)  =  o  lim  =  -2cwr  . 

A_o'  *  •  A-.0  7J  dh 


Therefore, 

By  (4.7)  and  (4.8),  we  have 


lim  71 
A-o  * 


,m+l  ^  2a# 

1  ■*  ® 


ii. 

Ai  “  1  +  «-»«»  “  e«'  +  e“*» 


When  a  is  large  enough,  the  d/’s  can  be  ignored,  so  that  the  system  (3.1)  becomes  block- 
diagonal  and  moreover,  for  a  large,  the  eigenvalues  A,/  approach  (4.4)  and  in  that  case,  (4.3)  is  a 


figure  3:  Condition  number  of  the  preconditioned  capaci¬ 
tance  matrix  with  Golub  and  Mayers’  preconditoner  for  two, 
three,  and  up  to  eight  strips  {k  u  the  number  of  interfaces). 


good  approximation  to  C.  For  a  small,  the  condition  number  of  the  preconditioned  ci^acitance 
matrix  becomes  luge.  Fig.  3  shows  the  dependence  of  the  condition  number  on  the  aspect  ratio  a 
for  different  numbers  of  strips. 


5.  Dividing  the  domain  into  boxes 

In  the  parallel  implementation  of  domain  decompontion,  the  ratio  between  the  amount  of 
communication  and  the  amount  of  computation  required  grows  as  the  strips  become  narrower.  For 
this  reason,  it  seems  natural  to  divide  the  dommn  into  boxes. 

Boxes  introduce  a  new  difficulty,  the  cross  points.  Cross  points  are  more  difficult  to  handle, 
since  they  represent  a  strong  coupling  between  interfaces.  We  refer  the  interested  reader  to  {2]. 

One  way  around  cross  points  is  to  use  algorithm  DDFPS  recurmvely.  We  can,  for  example, 
apply  algorithm  DDFPS  to  the  domain  D  divided  into  horizontal  strips  Q,-,  and  then  solve  the 
subproblems  in  ste^  1  and  5  by  subdividing  each  strip  D,  in  vertical  strips  n^'.  Another  approadi 
is  given  by  nested  dissection,  which  can  be  described  recursively  as  follows:  the  domain  O  is  divided 
in  two  strips  Oi  and  Q2.  We  will  call  this  partition  Pi,  and  given  a  partition  Pi+i  is  obtidned 
by  subdividing  each  subdomadn  of  Pi  into  two  (vertical  or  horizont^)  strips.  An  example  of  the 
partition  procedure  is  given  in  Fig.  4  Algorithm  DDFPS  with  h  «  1  is  applied  to  Pi,  and  it  is 


Figure  4:  An  example  of  nested  dissection 


applied  recursively  in  steps  1  and  5  to  solve  for  the  problems  in  the  subdomuns.  Conventional  fast 
solvers  are  used  for  the  subproblems  in  the  finest  partition. 

6.  Concluding  remarks 

We  present  a  fast  direct  Poisson  solver  on  a  rectangle  divided  into  parallel  strips  or  boxes. 
The  method,  as  opposed  to  others,  does  not  involve  any  iteration  in  the  solution  of  the  system  for 
the  interface  variables.  It  is  especially  suited  for  parallel  implementation,  since  the  independent 
problems  in  the  subdomains  can  be  solved  in  parallel,  and  the  communication  can  be  reduced  to  k 
vectors  of  length  n,  where  n  is  the  number  of  points  in  the  interface. 

Since  the  method  is  based  on  the  fact  that  C  can  be  solved  by  FFT’s,  it  can  be  generalized 
to  other  separable  operators  with  coefficients  which  are  constant  in  the  z  or  y  direction,  and  other 
boundary  conditons. 

It  appears  that  domain  decomposition  requires  more  computation  than  a  conventional  fast 
solver  on  the  whole  domain,  since  the  algorithm  requires  two  solves  on  each  subdomain,  namely 
steps  1  and  5.  Many  operations  can  be  saved,  however,  since  only  one  row  of  the  solution  is  needed 
in  step  1  for  the  computation  of  the  right  hand  side  (2.7).  Also,  some  of  the  computations  in  step 
1  can  be  saved  for  step  5,  since  the  right  hand  side  for  the  subproblems  in  steps  1  and  5  differ 
only  at  one  or  two  rows  of  the  grid  points.  With  these  savings,  algorithm  DDFPS  can  be  made 
computationally  equivalent  to  the  fast  Poisson  solver  on  a  rectangle  using  FFT’s  in  the  z-direction 
amd  solving  n  tridiagonal  systems  in  the  ^'direction. 
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